Entry Name: VADER/VIS

VAST Challenge 2015
Mini-Challenge 1

Team Members:

Robert Krueger, VIS, University of Stuttgart, robert.krueger@vis.uni-stuttgart.de PRIMARY

Michael Steptoe, VADER Lab, Arizona State University, msteptoe@mainex1.su.edu

Rolando Garcia, VADER Lab, Arizona State University, rsgarci1@asu.edu

Sagarika Kadambi, VADER Lab, Arizona State University, skadambi@asu.edu

Thomas Ertl, VIS, University of Stuttgart, Thomas.ertl@vis.uni-stuttgart.de

Ross Maciejewski, VADER Lab, Arizona State University, rmacieje@asu.edu

Student Team: YES

Did you use data from both mini-challenges?
We did both mini-challenges, but for this solution (MC1) we only consider data from MC1.

Analytic Tools Used:

We developed our own tool, based on Java (Backend) and d3 (Frontend).
In the beginning, we took a quick look at the data with Tableau.

Approximately how many hours were spent working on this submission in total?

~250 hours over one month between all the students

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES

Video Download

Video:

https://youtu.be/DL6A__IWRn4

------------------------------------------------------------------------------------------------------------------------------------------

Questions

MC1.1 – Characterize the attendance at DinoFun World on this weekend. Describe up to twelve different types of groups at the park on this weekend.

a. How big is this type of group?

b. Where does this type of group like to go in the park?

c. How common is this type of group?

d. What are your other observations about this type of group?

e. What can you infer about this type of group?

f. If you were to make one improvement to the park to better meet this group’s needs, what would it be?

Limit your response to no more than 12 images and 1000 words.

Our approach was to first process the data into patron trajectories, defined by attraction check-in information around the park (71 locations including entrances), and then cluster patrons who had similar trajectories (meaning that they traveled around the park together). A large majority of attractions never posted check-in information, but one can infer check-in/check-out times based on how long a person lingered in the general area of an attraction. We created inferred check-ins as follows:

· If a person’s location remains within a distance d of an attraction a for longer than a temporal threshold t, then we consider this to be an inferred check-in at a. We used d = 5 pixels and t = 5 minutes (analysts can vary these parameters).
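The rule above can be sketched in a few lines. This is only an illustration under assumptions that are not in the original submission: movement samples are taken to be time-sorted `(timestamp_seconds, x, y)` tuples, distance is Euclidean in pixels, and the function name `infer_checkins` is hypothetical.

```python
from math import hypot

def infer_checkins(samples, attractions, d=5.0, t=300):
    """Return (attraction_id, start, end) for each stay longer than t seconds.

    samples: list of (timestamp_seconds, x, y), sorted by time (assumed format).
    attractions: dict mapping attraction id -> (x, y) in the same pixel units.
    """
    def nearest_within(x, y):
        # Closest attraction within distance d, or None if there is none.
        best, best_dist = None, d
        for aid, (ax, ay) in attractions.items():
            dist = hypot(x - ax, y - ay)
            if dist <= best_dist:
                best, best_dist = aid, dist
        return best

    checkins = []
    current, start, last = None, None, None
    for ts, x, y in samples:
        aid = nearest_within(x, y)
        if aid != current:
            # The previous stay ended at the last sample seen near `current`.
            if current is not None and last - start > t:
                checkins.append((current, start, last))
            current, start = aid, ts
        last = ts
    # Close a stay that is still open at the end of the data.
    if current is not None and last - start > t:
        checkins.append((current, start, last))
    return checkins
```

Both thresholds are ordinary parameters here, mirroring the point that analysts can vary d and t.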

A spatiotemporal trajectory structure capturing check-ins was created. Our observation is that if users visit attractions in the same order at the same time, then they are likely traveling the park together. We aggregate the check-ins into five-minute intervals (user adjustable). A spatiotemporal trajectory then consists, for each interval, of the location that the user last checked into. For example, if a user checks in at the park gate (attraction 84), takes fifteen minutes to reach the Flying TyrAndrienkos (12), stays there for 30 minutes, and then goes to Tricerastop (52) for 10 minutes, their trajectory would be:

· 84-84-84-12-12-12-12-12-12-52-52-….
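As a rough sketch of how such a sequence could be built, assuming check-ins are given as time-sorted `(minute_offset, attraction_id)` pairs measured from park opening (this input format and the name `build_trajectory` are assumptions, not part of the original tool):

```python
def build_trajectory(checkins, n_slots, slot_minutes=5, outside="not-in-park"):
    """Fill each time slot with the attraction the visitor last checked into.

    checkins: list of (minute_offset, attraction_id), sorted by time (assumed).
    Slots before the first check-in receive the `outside` marker.
    """
    trajectory = []
    idx = -1  # index of the most recent check-in at or before the slot start
    for slot in range(n_slots):
        slot_start = slot * slot_minutes
        while idx + 1 < len(checkins) and checkins[idx + 1][0] <= slot_start:
            idx += 1
        trajectory.append(checkins[idx][1] if idx >= 0 else outside)
    return trajectory

example = build_trajectory([(0, 84), (15, 12), (45, 52)], n_slots=11)
print("-".join(str(loc) for loc in example))
# prints 84-84-84-12-12-12-12-12-12-52-52
```

The printed sequence reproduces the example above: three slots at the gate, six at attraction 12, and two at attraction 52.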

For a full day at the park (8AM-12AM, i.e. 16 hours of twelve five-minute intervals each), we have a trajectory of length 192. We combine all three days into one trajectory, and a “not-in-park” value is stored whenever the user is out of the park. The trajectory structure can also be adjusted to represent park regions (five regions) or attraction categories (thrill ride, etc.; 10 in total).

Our interface (above) consists of:

1) Visual queries, clustering and outliers

2) Calendar view

3) Sequence View - temporal check-in sequences per visitor or the most representative check-in sequence of a group

4) Map View - trajectories, heat-maps, animation

The main feature used for group exploration was the “Detect Group” option. This applies agglomerative hierarchical clustering using pre-computed Levenshtein distance matrices for the trajectory sequences. Sequences are compared element by element as strings (e.g. ‘84’ from 8:55 to 9:00AM does not match ‘12’ from 8:55 to 9:00AM). Distances are normalized by the square root of the maximal length of the compared sequences. A complete linkage strategy was used for the final clustering, and groups can be extracted via the tolerance control widget (values of 0-7 seemed best).
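The clustering step described here can be illustrated with a small, unoptimized sketch. It assumes trajectories are lists of location ids, treats the tolerance as a cut-off on the normalized distance, and uses hypothetical function names. The original tool works from pre-computed matrices in a Java backend, so this only shows the idea, and the tolerance scale of this sketch need not match the 0-7 range of the widget.

```python
from math import sqrt

def levenshtein(a, b):
    """Edit distance between two sequences, compared element by element."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

def normalized_distance(a, b):
    """Levenshtein distance divided by sqrt of the longer sequence length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / sqrt(longest) if longest else 0.0

def complete_linkage(seqs, tolerance):
    """Merge clusters while their farthest members stay within tolerance."""
    n = len(seqs)
    dist = [[normalized_distance(seqs[i], seqs[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pair distance.
                link = max(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or link < best[0]:
                    best = (link, p, q)
        if best[0] > tolerance:
            break
        _, p, q = best
        clusters[p] += clusters.pop(q)
    return clusters
```

With complete linkage, every pair of members in an extracted group lies within the tolerance of each other, which suits the goal of finding visitors whose whole day looks alike.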

Once the clustering is done, patrons belong to local-groups (i.e., patrons who travel the park together). These local-groups range in size from 2 to 44 visitors. We detected ~2300 such local-groups. For visualization purposes, an average movement sequence is computed for each local-group by taking, at each time slice, the location visited by the most members.
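A per-time-slice majority vote of this kind might look as follows; the function name is hypothetical and all member trajectories are assumed to have the same length.

```python
from collections import Counter

def representative_sequence(member_trajectories):
    """Per time slice, take the location visited by the most group members."""
    return [Counter(slot).most_common(1)[0][0]
            for slot in zip(*member_trajectories)]
```

Ties are resolved by whichever location appears first among the members, which is an arbitrary choice of this sketch.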

To find group types (groups that have similar interests within the park but do not necessarily travel together), we create a feature vector for each local-group (number of thrill rides visited, etc.) and cluster the local-groups using k-means. This gives us group types. We also support a view to explore feature vectors of individuals, local-groups and group types (below).
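The following sketch shows one way such feature vectors and a k-means step could be put together. The category list, the `category_of` mapping, and the choice to count one visit per run of consecutive time slices are all assumptions made for illustration; the submission does not list the exact features used.

```python
import random

# Hypothetical subset of the attraction categories used as features.
CATEGORIES = ["thrill", "kiddie", "food", "shopping"]

def feature_vector(group_sequence, category_of):
    """Count visits per category; a run of equal locations is one visit."""
    counts = [0] * len(CATEGORIES)
    previous = None
    for loc in group_sequence:
        if loc != previous:
            cat = category_of.get(loc)
            if cat in CATEGORIES:
                counts[CATEGORIES.index(cat)] += 1
        previous = loc
    return counts

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[c])))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each resulting label would correspond to one candidate group type, which an analyst can then inspect and name.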

Groups identified include:

1. Tours

a) 33-44 patrons per local-group

b) Visits lots of the park but few kid rides.

c) ~32 local-groups (out of 2300)

d) Beer garden visit between nearly every ride.

e) Likely comes via bus, little interest in the shows and pavilions.

f) Volume ticket purchase discount.

2. Smaller adult groups

a) 2-11 patrons per local-group

b) Mostly do thrill rides.

c) ~100 local-groups (out of 2300)

d) Do not use the park’s overnight accommodations.

e) Groups are probably private arrangements of friends/colleagues.

f) Adding new rollercoasters could increase return visits.

3. Stage group

a) 8 patrons per local-group

b) Goes to the stage for every show.

c) 1 local-group (out of 2300)

d) They don’t check in and arrive ~5 minutes before the shows start.

e) Probably the soccer star’s staff or security for the show.

f) For star visits, a private entrance nearer the stage could be important.

4. Half-Day patrons

a) 2-7 patrons per local-group

b) They like short visits.

c) ~270 local-groups (out of 2300)

d) They either come in the early morning or after lunch.

e) They are mainly in the park for a few main attractions.

f) The park owners should offer half day admission fees.

The screenshot shows heat maps of different groups within this group type as small multiples. Visitors within these groups go to very similar rides.

5. Foodies

a) 2-11 patrons per local-group

b) They like food (2 to 5 hours at restaurants)

c) ~250 local-groups (out of 2300)

d) This group stays at food places 2-5 hours.

e) This group is only active in the morning.

f) Offer an eating pass that includes samples at various park restaurants.

6. Shoppers

a) 2-11 patrons per local-group.

b) Finish their day with several hours of shopping.

c) ~500 local-groups (out of 2300)

d) Always buy lunch at park too.

e) Big spenders at the park.

f) Offer a buy x get y free deal.

(1) Typical shopping group (shops mainly before leaving the park, with short breaks). (2) By filtering for groups that shop often we can compare their features in a small multiple view (3).

7. Nappers

a) ~1-6 patrons per local-group.

b) Leaves the park near lunchtime

c) ~50 local-groups (out of 2300)

d) May be a more price-sensitive group.

e) Leaves the park either for different/cheaper food or to take a rest.

f) Offer cheaper lunch options.

8. The Non-Check-In Group

a) ~1 patron per local-group

b) Either their tracking devices are erroneous, or they are not normal visitors (i.e. staff).

c) ~70 local-groups (out of 2300)

d) When we use our method to infer check-ins these sequences are quite short and contain few rides.

e) Technologically challenged.

f) Devices should be more stable.

MC1.2 – Are there notable differences in the patterns of activity in the park across the three days? Please describe the notable difference you see.

Limit your response to no more than 3 images and 300 words.

To explore patterns of activity in the park, we employed a traditional calendar view and also investigated how a new “probability view” could show where and how patrons travel around the park.

The calendar view shows the number of patrons checked in to a ride, aggregated at half-hour intervals. You can view the counts for Friday, Saturday, Sunday, and all three days in the “Any Day” view. The “Every Day” view shows IDs that were at the same location at the same time on every day, but it was not used here. The following image shows daily routines and anomalies from the calendar view.

1) Entries/exits are most busy during opening/closing hours.

2) Restaurants are empty until noon.

3) Shops are most busy in the evening hours.

4) There are two shows a day at the Grinosaurus stage. However, on Sunday, there is only one show in the morning. This might be related to the vandalism and stolen medals. In general, the pavilion is not visited during the shows (probably closed) but busy all other times. On Sunday the pavilion is also empty in the afternoon. We hypothesize that during the first show on Sunday the medals get stolen and there is vandalism in the pavilion. Then the pavilion is shut down for the rest of the day.

5) Several attractions break down during the weekend; for example, TyrAndrienkos is empty on Sunday for 30 minutes.

6) All movements/check-ins after 8:30PM on Friday are missing.

The probability view shows the attraction a visitor is most likely to attend next. The above figure shows slight differences between the three days. We used this view to see whether an obvious Markov model fit the data, but the data showed that visitors were always most likely to go to thrill rides next.
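A first-order estimate of "most likely next attraction" can be sketched as below. It assumes trajectories are lists of location ids per time slice and ignores slices where the visitor stays put; the function names are hypothetical and this is not necessarily how the probability view was computed.

```python
from collections import Counter, defaultdict

def transition_probabilities(trajectories):
    """Estimate P(next | current) from consecutive, distinct locations."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            if cur != nxt:  # ignore time slices spent staying in place
                counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for cur, c in counts.items()}

def most_likely_next(probs, current):
    """Return the most probable next location, or None if unseen."""
    options = probs.get(current)
    return max(options, key=options.get) if options else None
```

Computing this table separately for Friday, Saturday and Sunday would give the kind of per-day comparison described above.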

MC1.3 – What anomalies or unusual patterns do you see? Describe no more than 10 anomalies, and prioritize those unusual patterns that you think are most likely to be relevant to the crime.

Limit your response to no more than 10 images and 500 words.

In order to explore anomalies and unusual patterns of movement related to the crime, we have developed a visual query interface based on Boolean operations that can find patrons that visit specific locations at specific times of the day.

To do this, a Boolean operation is first selected (e.g. or) (1). Using this operator, the analyst selects time and location cells from the calendar view; these selections constrain the query (2). In this example, the user queries for all users that never went to the soccer star’s stage performance. This is done in the Any Day view. Finally, the query is executed (3), resulting in a number of trajectories that can be investigated in the sequence view.
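The logic behind such a query can be sketched as a filter over trajectories. The operator names ("or", "and", "not"), the cell format `(slot_index, location_id)` and the function name are assumptions made for this illustration; the submission only states that Boolean operations are supported.

```python
def run_query(trajectories, cells, operator="or"):
    """Return visitor ids whose trajectory satisfies the selected cells.

    trajectories: dict visitor_id -> list of location ids, one per time slice.
    cells: list of (slot_index, location_id) pairs picked in the calendar.
    """
    def hits(traj):
        # One boolean per selected cell: was the visitor there at that time?
        return [slot < len(traj) and traj[slot] == loc for slot, loc in cells]

    combine = {
        "or": any,                                  # at least one cell matches
        "and": all,                                 # every cell matches
        "not": lambda matches: not any(matches),    # no cell matches
    }[operator]
    return sorted(vid for vid, traj in trajectories.items()
                  if combine(hits(traj)))
```

Under these assumptions, the "never went to the stage performance" example corresponds to selecting the stage cells during the show and applying the "not" operator.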

Using the visual query tool, we can, for example, query for all users that went to Creighton Pavilion during the show at the Grinosaurus stage, when it was supposed to be closed (1,2).

While most of the sequences are identical (indicating that this is probably a group that visits the park together), there are a few sequences that are different (2, in the red oval). We also note that without our approach to inferring visits there is just a single check-in in the park around 9:30AM, when the Pavilion should be closed (3, left), which means that all others found (43 visitors) were there but did not check in. They might be staff or could have entered the building without permission. Querying for sequences similar to that of the suspicious person reveals two nearly identical sequences (3, right – dashed rectangle). The first one, however, has an additional check-in at 9:30AM.

On Sunday another visitor (id: 1983765) first goes to the Pavilion, then spends 2 hours at the Scholtz express (attraction 20), and exits the park at 11:45 AM. The crime is discovered between 11:30 and 12:00. Could he have committed the crime between 9:00 and 9:10? The pavilion is closed after 9:00 but by 9:30. Security possibly failed to notice the crime until re-opening.

Along with the visual query tool, we have also created a method for detecting outliers in the trajectory sequences. For the outlier detection we again use the Levenshtein distance matrix. We simply sum all distances for each sequence and sort the sums in descending order. The sequences that are farthest from all other sequences can be considered outliers. The analyst may query for the n most anomalous sequences. Outlier detection (1) shows the most different sequences (compared to all others). Here we find sequences (2) that are very short, that contain only a park entry but no check-ins (especially when we do not consider inferred check-ins), and sequences without any ride or thrill ride visit.
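The ranking step is simple enough to show directly. This sketch assumes the distance matrix is a square list of lists indexed by sequence; the function name is hypothetical.

```python
def most_anomalous(distance_matrix, n):
    """Rank sequences by their summed distance to all others, largest first."""
    totals = [(sum(row), idx) for idx, row in enumerate(distance_matrix)]
    # Sort by descending total distance; break ties by original index.
    totals.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in totals[:n]]
```

Because the matrix is already needed for clustering, this outlier ranking adds only one pass over it.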
